Enriching Text-to-Speech Synthesis Using Automatic Dialog Act Tags
Abstract
We present an approach for enriching dialog-based text-to-speech (TTS) synthesis systems by explicitly controlling expressiveness through dialog act tags. The dialog act tags in our framework are obtained automatically from a maximum entropy classifier trained on the Switchboard-DAMSL data set, which is unrelated to the TTS database. We compare the voice quality produced using automatic dialog act tags with that produced using human annotations of dialog acts, and with two forms of reference databases. Even though the automatic tagger and the human annotation use different tag inventories, exploiting either form of dialog markup yields better voice quality than the reference voices in subjective evaluation.
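A maximum entropy classifier is equivalent to multinomial logistic regression. The sketch below illustrates how such a tagger assigns a dialog act to an utterance. It rests on several assumptions: simple bag-of-words features and a handful of hypothetical toy utterances with made-up coarse labels, rather than Switchboard-DAMSL data or its tag set. The names `features`, `MaxEntTagger`, and `TRAIN`, along with the training hyperparameters, are illustrative and are not the authors' setup.

```python
import math

# Hypothetical toy training data with made-up coarse labels.
# These are NOT Switchboard-DAMSL utterances or tags; they only illustrate the idea.
TRAIN = [
    ("do you like it?", "question"),
    ("are you going there?", "question"),
    ("is that right?", "question"),
    ("i work at a bank", "statement"),
    ("we went there last year", "statement"),
    ("i like it a lot", "statement"),
    ("uh-huh", "backchannel"),
    ("yeah", "backchannel"),
    ("right", "backchannel"),
]


def features(utterance):
    """Assumed feature set: lowercase word indicators, '?' as its own token, and a bias."""
    tokens = utterance.lower().replace("?", " ?").split()
    return {"w=" + t for t in tokens} | {"bias"}


class MaxEntTagger:
    """Maximum entropy (multinomial logistic regression) tagger over binary features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = {}  # (feature, label) -> weight; missing keys mean 0.0

    def probs(self, utterance):
        feats = features(utterance)
        scores = {y: sum(self.weights.get((f, y), 0.0) for f in feats)
                  for y in self.labels}
        top = max(scores.values())  # subtract the max score for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def predict(self, utterance):
        p = self.probs(utterance)
        return max(p, key=p.get)

    def train(self, data, epochs=50, lr=0.5, l2=0.01):
        # Stochastic gradient ascent on the conditional log-likelihood.
        # The L2 penalty is applied only to weights of features active in the
        # current example (a common simplification for sparse features).
        for _ in range(epochs):
            for utterance, gold in data:
                feats = features(utterance)
                p = self.probs(utterance)
                for y in self.labels:
                    # Per-feature gradient: observed indicator minus model probability.
                    g = (1.0 if y == gold else 0.0) - p[y]
                    for f in feats:
                        w = self.weights.get((f, y), 0.0)
                        self.weights[(f, y)] = w + lr * (g - l2 * w)
        return self


tagger = MaxEntTagger(["question", "statement", "backchannel"]).train(TRAIN)
for text in ["are you sure?", "i like a bank", "uh-huh"]:
    print(text, "->", tagger.predict(text))
```

In the approach described above, the predicted tag would then be used to control expressiveness during synthesis. This sketch does not show how that control is realized.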
Similar Resources
Enriching machine-mediated speech-to-speech translation using contextual information
Conventional approaches to speech-to-speech (S2S) translation typically ignore key contextual information, such as prosody, emphasis, and discourse state, in the translation process. Capturing and exploiting such contextual information is especially important in machine-mediated S2S translation, as it can serve as a complementary knowledge source that can potentially aid the end users in improved unde...
Authoring tools for speech synthesis using the SABLE markup standard
In text-to-speech (TTS) synthesis, input text is automatically analyzed. This involves prediction of pronunciation, intonation, and timing at segmental and phrase level. In the design of dialog applications, developers need more control over the text-to-speech conversion. While the automatic analysis is often unsatisfactory, the developer can easily provide hints that improve the synthetic spee...
Backoff Model Training using Partially Observed Data: Application to Dialog Act Tagging
Dialog act (DA) tags are useful for many applications in natural language processing and automatic speech recognition. In this work, we introduce hidden backoff models (HBMs) where a large generalized backoff model is trained, using an embedded expectation-maximization (EM) procedure, on data that is partially observed. We use HBMs as word models conditioned on both DAs and (hidden) DA segments....
Natural vs. Synthesized Speech in Spoken Dialog Systems Research: Comparing the Performance of Recognition Results
In this paper, we test the effect of using speech synthesis when interacting with a spoken dialog system (SDS). We use a user simulation to connect our speech synthesis to a real, state-of-the-art automatic speech recognition (ASR) component deployed in a working commercial SDS via a standard telephone line. In a series of experiments, we compare human-machine dialogs and their recognition scor...
Speech acts and dialog TTS
The approach outlined in this paper aims to provide better expressivity of unit selection TTS for dialog intended applications while retaining the natural sounding voice quality typical of unit selection synthesis. A small set of speech acts were used to annotate a corpus from one female US English speaker. The corpus was composed of speech read primarily from interactive dialogs of various kin...